Apache VXQuery: A Scalable XQuery Implementation

Authors

E. Preston Carman

Till Westmann

Vinayak R. Borkar

Michael J. Carey

Vassilis J. Tsotras

Abstract

The wide use of XML for document management and data exchange has created the need to query large repositories of XML data. To query such large data collections efficiently and take advantage of parallelism, we have implemented Apache VXQuery, an open-source scalable XQuery processor. The system builds upon two other open-source frameworks: Hyracks, a parallel execution engine, and Algebricks, a language-agnostic compiler toolbox. Apache VXQuery extends these two frameworks and provides an implementation of the XQuery specifics (data model, data-model-dependent functions and optimizations, and a parser). We describe the architecture of Apache VXQuery, its integration with Hyracks and Algebricks, and the XQuery optimization rules applied to the query plan to improve path expression efficiency and to enable query parallelism. An experimental evaluation using a real 500GB dataset with various selection, aggregation, and join XML queries shows that Apache VXQuery performs well in terms of both scale-up and speed-up. Our experiments show that it is about 3x faster than Saxon (an open-source and commercial XQuery processor) on a 4-core, single-node implementation, and around 2.5x faster than Apache MRQL (a MapReduce-based parallel query processor) on a cluster of eight 4-core nodes.

Similar resources

A Parallel and Scalable Processor for JSON Data

Increasing interest in JSON data has created a need for its efficient processing. Although JSON is a simple data exchange format, its querying is not always effective, especially in the case of large repositories of data. This work aims to integrate the JSONiq extension to the XQuery language specification into an existing query processor (Apache VXQuery) to enable it to query JSON data in para...

A Common Compiler Framework for Big Data Languages: Motivation, Opportunities, and Benefits

We are in the era of Big Data and cluster computing. Data sizes have been growing at an exponential rate. At the same time, growth in computing power has been stagnating due to physical limits in processor technology. The only cost effective way to keep up with the growing data trend has been to harness multiple commodity computers in a shared-nothing configuration. Google, needing to manage ex...

Having a ChuQL at XML on the Cloud

MapReduce/Hadoop has gained acceptance as a framework to process, transform, integrate, and analyze massive amounts of Web data on the Cloud. The MapReduce model (simple, fault tolerant, data parallelism on elastic clouds of commodity servers) is also attractive for processing enterprise and scientific data. Despite XML ubiquity, there is yet little support for XML processing on top of MapReduc...

Pushing XML Main Memory Databases to their Limits

The wide distribution of XML documents and the standardization of the query languages XPath and XQuery have led to a wide variety of XML database implementations. Yet the efficient processing of really large XML documents is still supported by just a few products, such as MonetDB/XQuery as an open-source solution [1] or X-Hive as a commercial product [2]. Following the main memory and relation...

Cardinality-Aware Purely Relational XQuery Processor

The use of XML continues to grow in popularity, large repositories of XML documents are going to emerge, and users are likely to pose increasingly complex queries on these data sets. In 2001, XQuery was chosen by the World Wide Web Consortium (W3C) as the standard XML query language. In this article, we describe the design and implementation of an efficient and scalable purely rel...

Journal:

CoRR

Volume abs/1504.00331

Publication date: 2015
